RedRibbon: A new rank–rank hypergeometric overlap for gene and transcript expression signatures

RedRibbon is a tool for comparing differential omics analyses. It reveals the features that overlap between two differential studies, and it is designed for high performance, accuracy, and ease of use.

To demonstrate RedRibbon's basic workflow, we will first generate and use a synthetic data set. Next, the tool will be applied to real data sets. We recommend that you first read this simple experimental section to familiarize yourself with the tool before applying it to your own data. Each of the functions or methods described in this R vignette is fully documented. This documentation can be accessed the usual way in R, by prefixing the function name with a question mark (e.g., ?RedRibbon.data.frame).

Synthetic data set
The synthetic data set is composed of two labeled lists of numbers. A data.frame, called df, will contain these lists. This data.frame is composed of the columns a and b, the two lists of numbers, and id, a list of labels. These columns are mandatory to run the RedRibbon rank-rank hypergeometric overlap. One quarter of the synthetic lists overlaps perfectly at the bottom of the lists, one quarter overlaps perfectly at the top, and the remaining elements are random.
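A data.frame with this layout could be generated along the following lines. This is a minimal sketch in base R, not necessarily the code used to produce the vignette's data: the list length (1000), the seed, the standard deviation of the random part, and the "gene" label prefix are all illustrative assumptions.

```r
set.seed(42)                    # assumption: any seed works, fixed here for reproducibility

n  <- 1000L                     # assumed list length
n4 <- n %/% 4L                  # size of one quarter

# Start from two identical, sorted lists centered around zero.
a <- seq_len(n) - n / 2
b <- a

# Replace the middle half of each list by independent random values,
# leaving the bottom and top quarters perfectly overlapping.
middle <- (n4 + 1L):(n - n4)
a[middle] <- rnorm(length(middle), sd = 100)
b[middle] <- rnorm(length(middle), sd = 100)

# Columns id, a and b are the ones RedRibbon expects.
df <- data.frame(id = paste0("gene", seq_len(n)), a = a, b = b)
head(df)
```

Because the random values are drawn with a standard deviation of 100, a few of them may rank among the extreme values, so the overlap at both ends is only approximately one quarter in rank terms.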

Overlapping
We will now compute an overlap with RedRibbon for this synthetic data set. First, the library is loaded with

    library(RedRibbon)
    #> Loading required package: scales
    #> Loading required package: ggrepel
    #> Loading required package: ggplot2
    #> Loading required package: data.table

Then, we create an S3 RedRibbon object with the RedRibbon function,

    rr <- RedRibbon(df, enrichment_mode="hyper-two-tailed")

We set the parameter enrichment_mode to use a two-tailed test. This way, we will simultaneously detect enrichment in correlated (up/up and down/down) and anti-correlated (up/down and down/up) genes.
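To make the idea of a two-tailed hypergeometric overlap concrete, the sketch below computes the test for a single pair of rank thresholds using only base R. It illustrates the statistical principle and is not RedRibbon's implementation: the helper overlap_pvalue is hypothetical, and doubling the smaller tail is just one simple convention for a two-sided P-value.

```r
# Hypothetical helper: hypergeometric overlap at one pair of rank thresholds.
# i and j are the numbers of lowest-ranked elements taken from a and b.
overlap_pvalue <- function(a, b, i, j) {
  n <- length(a)
  low_a <- order(a)[seq_len(i)]            # indices of the i smallest values of a
  low_b <- order(b)[seq_len(j)]            # indices of the j smallest values of b
  k <- length(intersect(low_a, low_b))     # observed overlap

  p_enrich  <- phyper(k - 1, i, n - i, j, lower.tail = FALSE)  # P(X >= k)
  p_deplete <- phyper(k, i, n - i, j)                          # P(X <= k)

  c(overlap    = k,
    enrichment = p_enrich,
    depletion  = p_deplete,
    two_tailed = min(1, 2 * min(p_enrich, p_deplete)))  # assumed two-sided convention
}

set.seed(1)
n <- 200
a <- rnorm(n)
b <- a + rnorm(n, sd = 0.5)   # b is correlated with a

overlap_pvalue(a, b, i = 50, j = 50)
```

A rank-rank overlap map repeats this kind of test over many threshold pairs; enrichment corresponds to correlated lists and depletion to anti-correlated ones.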
Next, we localize the minimal P-value in the four quadrants with the quadrants method on the RedRibbon object,

    quad <- quadrants(rr, algorithm="ea", permutation=TRUE, whole=FALSE)

We use the evolutionary algorithm (algorithm="ea") to locate the minimal P-value coordinates with the highest accuracy. We also request the computation of the adjusted P-value by setting the permutation option to TRUE. Setting whole=FALSE splits the overlap map into four quadrants (i.e., down-down, up-up, down-up, up-down).
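The principle behind a permutation-adjusted P-value can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the procedure implemented in RedRibbon: the minimal P-value is searched over a coarse fixed grid rather than with an evolutionary algorithm, the permutation simply shuffles one list, and the helper min_pvalue, the grid, and the number of permutations are all hypothetical choices.

```r
set.seed(7)
n <- 200
a <- rnorm(n)
b <- a + rnorm(n, sd = 1)      # b is correlated with a

# Hypothetical helper: minimal enrichment P-value over a coarse grid
# of rank thresholds (i for list a, j for list b).
min_pvalue <- function(a, b, steps = seq(20, 180, by = 20)) {
  n <- length(a)
  rank_a <- rank(a, ties.method = "first")
  rank_b <- rank(b, ties.method = "first")
  best <- 1
  for (i in steps) {
    for (j in steps) {
      k <- sum(rank_a <= i & rank_b <= j)                   # overlap at (i, j)
      p <- phyper(k - 1, i, n - i, j, lower.tail = FALSE)   # P(X >= k)
      best <- min(best, p)
    }
  }
  best
}

observed <- min_pvalue(a, b)

# Null distribution of the minimal P-value when the pairing is broken.
n_perm   <- 200
null_min <- replicate(n_perm, min_pvalue(a, sample(b)))

# Fraction of permutations at least as extreme as the observation.
p_adjusted <- (sum(null_min <= observed) + 1) / (n_perm + 1)
p_adjusted
```

Taking the minimum over many thresholds is a form of multiple testing, which is why the minimal P-value needs an adjustment of this kind before it is interpreted.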

Plotting the level map
Results can be plotted with the helper function ggRedRibbon. The overlap map highlights where the two lists go in the same direction (both downregulated or both upregulated in the two lists). The maximal log P-values and the permutation-adjusted P-values are shown for each quadrant. The horizontal and vertical dotted lines separate downregulation from upregulation, at the position where the log fold change is zero. The code to produce the map is straightforward,

    gg <- ggRedRibbon(rr, quadrants=quad, repel.force=250) + coord_fixed(ratio = 1, clip = "off")
    gg

In the overlap map, the hypergeometric P-values are signed: negative for depletion (anti-correlation) and positive for enrichment (correlation).
The gg variable contains a standard ggplot2 object.
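Since the result is a standard ggplot2 object, the usual ggplot2 layers and helpers apply to it. The sketch below uses a stand-in scatter plot so that it runs without RedRibbon; with the real workflow, gg would be the object returned by ggRedRibbon. The title, file name, and dimensions are illustrative assumptions.

```r
library(ggplot2)

# Stand-in plot (assumption); replace with the object returned by ggRedRibbon.
gg <- ggplot(data.frame(x = 1:10, y = 1:10), aes(x, y)) + geom_point()

# Add a centered title, as with any ggplot2 object.
gg2 <- gg +
  ggtitle("Synthetic overlap map") +
  theme(plot.title = element_text(hjust = 0.5))

# Save the figure to a temporary PDF file.
out <- file.path(tempdir(), "overlap_map.pdf")
ggsave(out, gg2, width = 5, height = 5)
```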

Compatibility mode
This mode only exists for compatibility with RRHO. Please use the new interface shown above when possible.

Analysis with real data sets
To demonstrate the versatility and potential of RedRibbon, we will apply it to diverse omics data sets. The method is agnostic to the underlying technology. Here, this robustness will be illustrated with RNA-Seq transcriptome (quantified at gene and transcript level), microarray, and proteome data.

Overlap of transcriptomic analyses
To demonstrate the capabilities of RedRibbon, we will use data from the GEO submission GSE159984. This data set has been analyzed in depth in the RedRibbon main manuscript. The data contain the output of two DESeq2 differential expression analyses. In this case, columns a and b contain the log2FoldChange values of the type 2 diabetes islet signature (T2DvsCTL) and of the signature of human islets treated with palmitate and glucose (D8PGsCTL).
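When the two differential analyses come as separate result tables, the input data.frame can be assembled by joining them on the feature identifier. The sketch below uses base R only; the two small tables (res_t2d and res_pg) and their values are hypothetical stand-ins for DESeq2 results, not data from GSE159984.

```r
# Hypothetical stand-ins for two DESeq2 result tables.
res_t2d <- data.frame(id = c("g1", "g2", "g3", "g4"),
                      log2FoldChange = c(1.2, -0.8, NA, 0.3))
res_pg  <- data.frame(id = c("g2", "g3", "g4", "g5"),
                      log2FoldChange = c(-1.1, 0.4, 0.9, -0.2))

# Keep only the features present in both analyses.
merged <- merge(res_t2d, res_pg, by = "id", suffixes = c(".a", ".b"))

# Build the id / a / b layout described above.
df <- data.frame(id = merged$id,
                 a  = merged$log2FoldChange.a,
                 b  = merged$log2FoldChange.b)

# Drop features without a fold change estimate in either analysis.
df <- df[complete.cases(df), ]
df
```

Only features measured in both studies can be ranked against each other, which is why an inner join followed by removal of missing values is used here.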
Then, ggRedRibbon is used to plot the overlap map.

Long read analysis compared to short read analysis
RedRibbon is capable of handling both long and short read RNA-Seq differential analyses. To illustrate this, we will compare the long (GSE167223) and short read (GSE137136) signatures of pancreatic beta cells exposed to cytokines. This data set has also been analyzed in the RedRibbon main manuscript.

sessionInfo()
Considering gene correlations, the adjusted P-value of the overlap is not significant. Hence, no minimal coordinate is shown on the map.